Polynomial multiplication on embedded vector architectures

Authors

Abstract

High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice-based cryptography. Its algorithmic properties and its suitability for implementation on different compute platforms are an active area of research, and this article contributes to this line of work. Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba multiplication strategy. Secondly, we provide implementations of those improvements on the Arm® Cortex®-M4 CPU, as well as on the newer Cortex-M55 processor, the first M-profile core implementing the M-profile Vector Extension (MVE), also known as Helium™ technology. We also implement the Number Theoretic Transform (NTT) on the Cortex-M55 processor. The Cortex-M55 is single-issue and in-order, and it offers only 8 vector registers, compared to 32 on A-profile SIMD architectures such as Neon™ technology and the Scalable Vector Extension (SVE). We show that, despite these constraints, careful register management and instruction scheduling yield a 3× to 5× performance improvement over already highly optimized Cortex-M4 implementations, while maintaining the low energy profile necessary for use in the embedded market. Finally, as a real-world application, we integrate our techniques into the post-quantum key-encapsulation mechanism Saber.
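The Karatsuba part of the Toom-Cook/Karatsuba strategy can be illustrated with a minimal Python sketch. It applies one Karatsuba split recursively and then reduces the product in the ring Z_q[x]/(x^n + 1). The sketch assumes Saber-like parameters n = 256 and q = 2^13. The names `karatsuba`, `negacyclic_reduce`, `polymul_ring` and the recursion cut-off `threshold` are hypothetical choices for illustration. This is not the optimized Cortex-M4 or Helium code described in the article.

```python
import random

N = 256        # ring degree (assumed Saber-like parameter)
Q = 1 << 13    # coefficient modulus 2^13 (assumed Saber-like parameter)


def schoolbook(a, b):
    """Plain O(n^2) product of two coefficient lists, no reduction."""
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] += ai * bj
    return res


def karatsuba(a, b, threshold=16):
    """Recursive Karatsuba product of equal-length lists (length a power of two).

    Writes a = a0 + x^h a1 and b = b0 + x^h b1, so that
    a*b = z0 + x^h (z1 - z0 - z2) + x^(2h) z2 with three half-size products.
    """
    n = len(a)
    if n <= threshold:
        return schoolbook(a, b)  # small sizes: schoolbook is cheaper
    h = n // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    z0 = karatsuba(a0, b0, threshold)                      # low * low
    z2 = karatsuba(a1, b1, threshold)                      # high * high
    z1 = karatsuba([x + y for x, y in zip(a0, a1)],
                   [x + y for x, y in zip(b0, b1)], threshold)  # (a0+a1)(b0+b1)
    res = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        res[i] += z0[i]
        res[i + h] += z1[i] - z0[i] - z2[i]
        res[i + 2 * h] += z2[i]
    return res


def negacyclic_reduce(c, n=N, q=Q):
    """Reduce a full product modulo x^n + 1 (x^n == -1) and modulo q."""
    out = [0] * n
    for i, ci in enumerate(c):
        if i < n:
            out[i] += ci
        else:
            out[i - n] -= ci
    return [x % q for x in out]


def polymul_ring(a, b):
    """Multiply two ring elements using Karatsuba plus negacyclic reduction."""
    return negacyclic_reduce(karatsuba(a, b))


if __name__ == "__main__":
    random.seed(0)
    a = [random.randrange(Q) for _ in range(N)]
    b = [random.randrange(Q) for _ in range(N)]
    # Compare against the schoolbook reference.
    print(polymul_ring(a, b) == negacyclic_reduce(schoolbook(a, b)))
```

Karatsuba uses only additions, subtractions and multiplications. The same recursion can therefore run on 16-bit lanes that wrap modulo 2^16, followed by a final reduction modulo 2^13. Higher Toom-Cook splits also divide by small constants during interpolation. Roughly speaking, this is why the slack between 13-bit coefficients and 16-bit words matters for power-of-two moduli.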
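NTT-based multiplication in a ring of the same shape, Z_q[x]/(x^n + 1), can be sketched as follows. The sketch assumes the NTT-friendly prime q = 7681, chosen only for illustration because 512 divides q - 1. Saber's power-of-two modulus has no such roots of unity, and n is not invertible modulo 2^13, so this is not a drop-in for Saber. The helper names are hypothetical. The transform is a textbook recursive radix-2 Cooley-Tukey NTT, with a "twist" by powers of psi that turns the negacyclic product into a cyclic one.

```python
import random

Q = 7681   # assumed NTT-friendly prime: 2n = 512 divides Q - 1 = 7680
N = 256    # ring degree


def prime_factors(m):
    """Set of prime factors of m by trial division."""
    fs, d = set(), 2
    while d * d <= m:
        while m % d == 0:
            fs.add(d)
            m //= d
        d += 1
    if m > 1:
        fs.add(m)
    return fs


def find_psi(n, q):
    """Primitive 2n-th root of unity mod prime q (requires 2n | q - 1)."""
    factors = prime_factors(q - 1)
    for g in range(2, q):
        if all(pow(g, (q - 1) // p, q) != 1 for p in factors):
            return pow(g, (q - 1) // (2 * n), q)  # g generates Z_q^*
    raise ValueError("no generator found")


def ntt(a, omega, q):
    """Recursive radix-2 NTT: evaluates a at omega^0 .. omega^(n-1)."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % q, q)
    odd = ntt(a[1::2], omega * omega % q, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q                 # butterfly: E + w*O, E - w*O
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
        w = w * omega % q
    return out


def polymul_ntt(a, b, q=Q):
    """Negacyclic product a*b mod (x^n + 1, q) via twisted cyclic NTT."""
    n = len(a)
    psi = find_psi(n, q)          # psi^n == -1 (mod q)
    omega = psi * psi % q         # primitive n-th root of unity
    at = [x * pow(psi, i, q) % q for i, x in enumerate(a)]  # twist
    bt = [x * pow(psi, i, q) % q for i, x in enumerate(b)]
    ct = [x * y % q for x, y in zip(ntt(at, omega, q), ntt(bt, omega, q))]
    c = ntt(ct, pow(omega, -1, q), q)     # inverse NTT, still scaled by n
    n_inv, psi_inv = pow(n, -1, q), pow(psi, -1, q)
    return [x * n_inv * pow(psi_inv, i, q) % q for i, x in enumerate(c)]


def schoolbook_negacyclic(a, b, q=Q):
    """O(n^2) reference: multiply and fold with x^n == -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]
    return [x % q for x in c]


if __name__ == "__main__":
    random.seed(0)
    a = [random.randrange(Q) for _ in range(N)]
    b = [random.randrange(Q) for _ in range(N)]
    print(polymul_ntt(a, b) == schoolbook_negacyclic(a, b))
```

The twist works because substituting x = psi*y maps x^n + 1 to -(y^n - 1). A negacyclic product of the original polynomials therefore becomes a cyclic product of the twisted ones, which a length-n NTT diagonalizes.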

Similar sources

Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures

Graphics processors are increasingly used in scientific applications due to their high computational power, which comes from hardware with multiple-level parallelism and memory hierarchy. Sparse matrix computations frequently arise in scientific applications, for example, when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to efficiently parallel...

Performance of an embedded optical vector matrix multiplication processor architecture

An embedded architecture of optical vector matrix multiplier (OVMM) is presented. The embedded architecture is aimed at optimising the data flow of vector matrix multiplier (VMM) to promote its performance. Data dependence is discussed when the OVMM is connected to a cluster system. A simulator is built to analyse the performance according to the architecture. According to the simulation, Amdah...

Enhanced Montgomery Multiplication on DSP Architectures for Embedded Public-Key Cryptosystems

Montgomery’s algorithm is a popular technique to speed up modular multiplications in public-key cryptosystems. This paper tackles the efficient support of modular exponentiation on inexpensive circuitry for embedded security services and proposes a variant of the finely integrated product scanning (FIPS) algorithm that is targeted to digital signal processors. The general approach improves on t...

Fast Matrix Multiplication Algorithms on Mimd Architectures

Sequential fast matrix multiplication algorithms of Strassen and Winograd are studied; the complexity bound given by Strassen is improved. These algorithms are parallelized on MIMD distributed memory architectures of ring and torus topologies; a generalization to a hyper-torus is also given. Complexity and efficiency are analyzed and good asymptotic behaviour is proved. These new parallel algor...

Algebraic adjoint of the polynomials-polynomial matrix multiplication

This paper deals with a result concerning the algebraic dual of the linear mapping defined by the multiplication of polynomial vectors by a given polynomial matrix over a commutative field

Journal

Journal title: IACR Transactions on Cryptographic Hardware and Embedded Systems

Year: 2021

ISSN: 2569-2925

DOI: https://doi.org/10.46586/tches.v2022.i1.482-505